Second language speech recognition using multiple-pass decoding with lexicon represented by multiple reduced phoneme sets
نویسندگان
چکیده
Considering that the pronunciation of second language speech is usually influenced by the mother tongue, we previously proposed using a reduced phoneme set for second language when the mother tongue of speakers is known. However, the proficiency of second language speakers varies widely, as does the influence of mother tongue on their pronunciation. Consequently, the optimal phoneme set is dependent on the proficiency of the second language speaker. In this work, we examine the relation between the proficiency of speakers and a reduced phoneme set customized for them. We propose a novel speech recognition method which is multiple-pass decoding using a lexicon represented by multiple reduced phoneme sets based on experimental results for speech recognition of second language speakers with various proficiencies. The relative error reduction obtained with the multiple reduced phoneme sets is 26.8% compared with the canonical one.
منابع مشابه
Continuous speech recognition in the WAXHOLM dialogue system
This paper presents the status of the continuous speech recognition engine of the WAXHOLM project. The engine is a software only system written in portable C code. The design is flexible and different modes for phonetic pattern matching are available. In particular, artificial neural networks and standard multiple Gaussian mixtures are implemented for phone probability estimation, and for resea...
متن کاملMultiple Reduced Phoneme Sets for Second Language Speech Recognition
This paper describes a novel method to improve the performance of second language speech recognition when the mother tongue of users is known. Considering that second language speech usually includes less fluent pronunciation and more frequent pronunciation mistakes, I propose using a reduced phoneme set generated by a phonetic decision tree (PDT)-based top-down sequential splitting method inst...
متن کاملMultiple-Pronunciation Lexical Modeling Based on Phoneme Confusion Matrix for Dysarthric Speech Recognition
In this paper, we propose speaker-dependent multiple-pronunciation lexical modeling for improving the performance of dysarthric automatic speech recognition (ASR). For each dysarthric speaker, a phoneme confusion matrix is first constructed from the results of phoneme recognition. Then, pronunciation variation rules are extracted by investigating the phoneme confusion matrix, and they are incor...
متن کاملOne pass cross word decoding for large vocabularies based on a lexical tree search organization
This paper describes the new Philips Research decoder that performs large vocabulary continuous speech recognition in a single pass for cross-word acoustic models and an m-gram language model (with m up to 4) as opposed to our previous technique of multiple passes. The decoder is based on a time-synchronous beam search and a prex tree structure of the lexicon. Cross-word transitions are treated...
متن کاملCan We Use the Linguistic Information in the Signal?
This article discusses the use of phonetic features in automatic speech recognition. The phonetic features are derived from acoustic parameters by means of Kohonen networks. Behind the use of phonetic features instead of standard acoustic parameters lies the assumption that it is useful to help the system to focus on linguistically relevant signal properties. Previous experiments using very sim...
متن کامل